Real Time Web Text Classification and Analysis of Reading Difficulty
نویسندگان
چکیده
The automatic analysis and categorization of web text has witnessed a booming interest due to the increased text availability of different formats, content, genre and authorship. We present a new tool that searches the web and performs in real-time a) html-free text extraction, b) classification for thematic content and c) evaluation of expected reading difficulty. This tool will be useful to adolescent and adult low-level reading students who face, among other challenges, a troubling lack of reading material for their age, interests and reading level.
منابع مشابه
Task Difficulty and Its Components: Are They Alike or Different across Different Macro-genres?
Task difficulty across different macro-genres continues to remain among less attended areas in second language development studies. This study examined the correlation between task difficulty across the descriptive, narrative, argumentative, and expository macro-genres. The three components of task difficulty (i.e., code complexity, cognitive complexity, and communicative stress) were also comp...
متن کاملطراحی وب سرویس مدیریت امدادرسانی پس از وقوع سیل با کمک اطلاعات جغرافیایی داوطلبانه (VGI) بر مبنای تکنولوژی متن باز
Accessibility to precise spatial and real time data plays a valuable role in the velocity and quality of flood relief operation and subsequently, scales the human and financial losses down. Flood real time data collection and processing, for instance, precise location and situation of flood victims may be a big challenge in Iran regarding the hardware facilities (such as high resolution aerial ...
متن کاملTEXTUAL AND INTER-TEXTUAL ANALYSES OF IRANIAN EFL UNDERGRADUATES’ TYPES OF ENGLISH READING TOWARDS DEVELOPING A CAREFUL READING FRAMEWORK
This study investigated textual and inter-textual reading of a group of Iranian EFL undergraduates’ careful English reading types. In this research, Khalifa and Weir’s (2009) reading framework was used to propose a more inclusive aspect of a careful reading framework and the reading construct for instructional and assessment goals. The participants of this study were B.A. students of English Tr...
متن کاملAn Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...
متن کاملCloze validation against IELTS Reading Paper: Doubts on correlational validation
Cloze was officially introduced in a journal on Journalism as a technique for estimating text readability and as "a new psychological tool for measuring the effectiveness of communication" (Taylor, 1953: 415). Different varieties of cloze have since been developed and experimented upon as measures of such diverse traits as reading comprehension and language proficiency. The findings of numerous...
متن کامل